Abstract

Advance reservation (AR) is important for guaranteeing the quality of
service of jobs by allowing exclusive access to resources over a defined
time interval. It is a challenge for the scheduler to organize available
resources efficiently and to allocate them appropriately for parallel AR
jobs with deadline constraints. This paper provides a slot-based data
structure to organize the available resources of multiprocessor systems in
a way that enables efficient search and update operations, and formulates a
suite of scheduling policies to allocate resources for dynamically arriving
AR requests. The performance of the scheduling algorithms was investigated
by simulations with different job sizes and durations, system loads and
scheduling flexibilities. Simulation results show that job sizes and
durations, system load and the flexibility of scheduling all affect the
performance metrics of the scheduling algorithms. The PE Worst Fit
algorithm gives the scheduler the highest acceptance rate of AR requests,
and jobs scheduled with the First Fit algorithm experience the lowest
average slowdown. The data structure and scheduling policies can be used to
organize and allocate resources for parallel AR jobs with deadline
constraints in large-scale computing systems.
1. Introduction

Grid-like massive Internet computing platforms have emerged as an
essential infrastructure for scientific and commercial applications and
have made possible "flexible, secure, coordinated resource sharing among
dynamic collections of individuals, institutions, and resources" [1]. In
order to guarantee the QoS of Grid applications in such environments,
advance reservation (AR) is incorporated into many Grid systems, such as
GARA [1], Nimrod/G [2] and G-QoSM [3], and into many mainstream parallel
schedulers, such as Maui [4], Load Sharing Facility (LSF) [5] and Portable
Batch System (PBS) [6]. Advance reservation makes the execution of Grid
jobs more predictable by reserving a particular resource capability over a
defined time interval on local resources, and has been widely used for many
kinds of jobs, such as single-processor jobs [7], parallel jobs [8],
mixed-parallel applications [9], co-allocation jobs [10], bag-of-tasks
[11], and workflows [12]. Advance reservation has been an important area of
interest in the Grid community.

In order to reserve resources in such AR systems, a user must submit a
request to the system by specifying a set of parameters such as the number
of processing elements needed, the ready time, the duration and the
deadline. For such an AR request, the job cannot start until its ready time
and it must be completed by its deadline. Upon receiving such an AR
request, it is the task of the scheduler to decide whether there are
sufficient available resources for the request to be completely executed
within the interval between its ready time and its deadline. Considering
that there are so many resources and optional allocations to check in a
large-scale computing system, it is quite a challenge for the scheduler to
organize available resources efficiently and to allocate resources
appropriately for dynamically arriving AR requests: the scheduling
procedure itself affects the ability and efficiency of the scheduler to
manage resources and to schedule a great number of jobs with various
requirements, and the scheduling decision also affects the performance
perceived by users and service providers. For users, the fraction of AR
requests accepted by the scheduler and the turnaround time are important
measures of how well their service requests are treated [13], [14]. For
clusters, the acceptance of new reservations fragments a continuous range
of resources into pieces, and thus reduces the potential scheduling
opportunities and results in lower utilization [15], [16]. The key
challenges here lie in two aspects: (1) to develop an efficient data
structure to organize available resources for AR requests in a way that
enables efficient search and update operations; (2) to develop a group of
scheduling algorithms or policies that improve the performance perceived by
users and providers.

In the literature, many data structures (such as arrays [17], linked lists
[18], trees [19], [17], [20] and queues [21], [22]) and scheduling
algorithms ([23], [15], [24], [20]) have been proposed for advance
reservations and widely studied. However, these data structures and
scheduling algorithms are only suitable for single- or multi-processor AR
jobs with an immediate deadline constraint. Such AR jobs must be scheduled
to run at their ready time, and their deadlines are immediate (i.e.,
deadline = ready time + duration). However, if AR requests are not strictly
required to start at their ready times, they can start at any time between
the ready time and the latest start time (i.e., deadline - duration). This
kind of AR request is more general than those with immediate deadlines, and
it makes organizing available resources, admission control and scheduling
both more flexible and more complicated for the scheduler. As a result, the
existing data structures and scheduling algorithms for AR requests with
immediate deadlines are not suitable for those with general deadlines.

Up to now, little research has been done on AR jobs with general
deadlines. In [25] and [7], [26], the authors investigated the problem of
how to allocate a single-channel (or single-processor) AR job with a
general deadline constraint to n single-processor servers. In those works,
because each job only needs to be reserved on a single-processor server, it
is not necessary to allocate more than one idle interval across multiple
servers simultaneously. Thus all the algorithms in those works only
consider the temporal constraint, and do not consider the scheduling of AR
jobs that need more than one resource simultaneously.

Although existing data structures and algorithms have been widely used for
AR requests, they are not suitable for parallel AR jobs with general
deadlines in large-scale multiprocessor systems. In this paper, we
investigate the problem of how to manage and allocate multiprocessor
resources for parallel AR jobs with a general deadline constraint. Unlike
existing data structures and scheduling policies, which were designed for
scheduling single-processor deadline-constrained AR jobs to n
single-processor servers, or for scheduling parallel AR jobs with immediate
deadlines to a multiprocessor system, this work proposes a new data
structure and scheduling policies to organize the availability of resources
in a large-scale multiprocessor system and to allocate them appropriately
for parallel AR jobs with a general deadline constraint. The main
contributions of this work are:

• We propose a new data structure to organize available resources
efficiently in multiprocessor systems for single- or multiple-processor AR
requests with immediate or general deadline constraints;

• We propose a set of operations for the data structure to enable efficient
search and update operations;

• We propose a set of scheduling policies for the data structure to
allocate resources for AR requests, and investigate their performance via
simulation. New scheduling policies can be added to the data structure
flexibly.

The rest of this paper is organized as follows. We discuss related work
in Section 2 and describe the model for scheduling parallel AR jobs with
general deadlines in a multiprocessor system in Section 3. In Section 4 we
introduce a slot-based data structure to organize the availability of
resources in a way that enables efficient adding, deleting and searching
operations. In Section 5 we provide a suite of scheduling algorithms for
parallel AR requests with a general deadline constraint, and in Section 6
we present a comprehensive performance evaluation study of the algorithms
by simulations. We then conclude the paper.

2. Related work 

Many data structures and scheduling algorithms have been proposed for
advance reservations. Most of them are suitable for AR requests with
immediate deadlines, and only a few were specifically designed for AR
requests with general deadlines. For AR requests with immediate deadlines,
data structures such as arrays [17], linked lists [18], trees [19], [17],
[20] and queues [21], [22] have already been widely studied. These data
structures are primarily used for admission control and focus on finding
out whether it is feasible for the scheduler to accept an AR request that
starts at a definite time and keeps running for a given period. In [22] the
author presents a good summary and comparison of them when they are used
for single- or multi-processor AR jobs with an immediate deadline
constraint. However, they are not specifically designed for AR requests
with a general deadline constraint.



Based on existing scheduling theory and algorithms for jobs with or
without deadlines, some variants of scheduling algorithms for jobs with
advance reservations have been studied in Grid-like systems [23], [15],
[24], [20], and their impact on the users and the systems was investigated
in terms of turnaround time, slowdown, or utilization.

In contrast to the plentiful research on AR requests with immediate
deadlines, little work has been done on AR jobs with general deadlines. In
[25], the problem of how to reserve optical bursts on wavelength channels
whose bandwidth may become fragmented with idle intervals was addressed.
Using concepts from computational geometry, the author maps each idle
interval and each burst to a point on a two-dimensional plane. The points
for idle intervals are then organized into a search tree, and several
algorithms, such as Min-SV, Max-SV, Min-EV, Max-EV and Best-fit, were
proposed for reserving bursts with and without fiber delay lines. Based on
the concept and algorithms in [25], the author of [7], [26] adapted them
for scheduling single-processor AR jobs with a general deadline constraint
to n single-processor servers. In those works, because each job only needs
to be reserved on a single-processor server, it is not necessary to
allocate more than one idle interval across multiple servers
simultaneously. Thus all the algorithms in those works only consider the
temporal constraint, and do not consider the scheduling of AR jobs that
need more than one resource simultaneously. Moreover, for the different
scheduling policies in [7], [26], the data structure used for storing the
availability information of the resources and the method for finding the
appropriate interval are different. This limits the flexibility of the data
structures to support new scheduling policies. In contrast, the data
structure proposed in this paper can support different scheduling policies
flexibly, without changing the data structure itself or the method of
finding appropriate resources for the requests.

3. Problem description 

The computing environment is a parallel system, e.g., a cluster or a
massively parallel processing machine, consisting of a group of $n$
space-shared processing elements $\{PE_1, PE_2, \ldots, PE_n\}$. For
simplicity, we assume the PEs are homogeneous. Each machine has a local
resource management system capable of supporting advance reservations for
local or external jobs. Figure 1 shows an example schedule of a parallel AR
job with a general deadline constraint. Assume the request of the AR job
arrives at $t_0$. The request asks the scheduler to allocate PEs between
the ready time and the deadline so that the job can run for its duration.
On receiving this request, the scheduler evaluates whether there are enough
resources available for the job to meet its deadline. If so, the scheduler
allocates and reserves them for the job; otherwise, the request is
declined. Moreover, if more than one allocation can satisfy the request,
only one of them is chosen based on some criteria or policies.

In this paper, each AR request with a deadline is characterized by a
five-parameter tuple $(t_a, t_r, t_{du}, t_{dl}, n_{pe})$, where:

1. $t_a$ is the arrival time of the request;

2. $t_r$ ($\ge t_a$) is the ready time, i.e., the earliest start time of
the job. When $t_r > t_a$ is permitted, advance reservations are supported
by the scheduler; otherwise, only immediate reservations are permitted;

3. $t_{du}$ is the duration of the job, i.e., the amount of time needed by
the job when running on the current cluster;

4. $t_{dl}$ ($\ge t_r + t_{du}$) is the deadline, i.e., the latest time by
which the job must be completed. If $t_{dl} = t_r + t_{du}$, the deadline
is immediate and we refer to this problem as scheduling with immediate
deadline. If $t_{dl} > t_r + t_{du}$, the deadline is general and we refer
to this problem as scheduling with general deadline; and

5. $n_{pe}$ is the number of PEs required by the request.

In Figure 1, at $t_0$ there are two running jobs (job1 and job2) and one
reserved job (job3) on the cluster. The scheduler can try to allocate the
job to start at any time from the ready time (i.e., $t_2$) to the latest
start time, i.e., $t_7$ ($= t_{dl} - t_{du}$), and then check whether there
are enough PEs for the job to run from the selected start time for
$t_{du}$.
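
The request tuple and the latest start time can be sketched in Python as
follows. This is only an illustration under stated assumptions: the class
name ARRequest, its field names and the mapping of slot $t_i$ to the integer
i are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ARRequest:
    """Illustrative container for the tuple (t_a, t_r, t_du, t_dl, n_pe)."""
    t_a: float   # arrival time
    t_r: float   # ready time (earliest start time)
    t_du: float  # duration
    t_dl: float  # deadline
    n_pe: int    # number of PEs required

    def __post_init__(self):
        # Enforce the constraints stated in the tuple definition.
        if self.t_r < self.t_a:
            raise ValueError("ready time must not precede arrival time")
        if self.t_du <= 0 or self.n_pe <= 0:
            raise ValueError("duration and PE count must be positive")
        if self.t_dl < self.t_r + self.t_du:
            raise ValueError("deadline must be at least t_r + t_du")

    @property
    def latest_start(self):
        # Latest start time that still meets the deadline.
        return self.t_dl - self.t_du

    @property
    def has_immediate_deadline(self):
        return self.t_dl == self.t_r + self.t_du


# The request of Figure 1, with slot t_i mapped to the integer i and an
# assumed size of 4 PEs.
req = ARRequest(t_a=0, t_r=2, t_du=2, t_dl=9, n_pe=4)
print(req.latest_start, req.has_immediate_deadline)
```

With these assumed numbers the latest start time is slot 7, which
corresponds to $t_7$ in the text.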

In this paper, we assume all AR jobs arrive dynamically and are
non-preemptive and non-malleable, i.e., they must run to completion once
they start execution, and their resource requirements, such as the number
of PEs, cannot be changed. Compared with preemptive and malleable AR jobs
[15], non-preemptive and non-malleable AR jobs are more difficult for the
scheduler to handle. Scheduling them under deadline constraints appears to
be NP-complete even for very restricted cases, and there are no optimal
online scheduling algorithms for them [27]. The scheduler therefore has to
rely on heuristics to schedule this kind of AR job.
Figure 1: An example of reserving processing elements for a new advance
reservation request with a general deadline constraint. Assume the new AR
request arrives at $t_0$, when there are two running jobs (job1 and job2)
and one reserved job (job3), and the AR request needs n processing elements
and should be processed between its ready time ($t_2$) and its deadline
($t_9$). Four feasible allocations for the AR request are illustrated.



In scheduling, both the procedure used to reach a decision and the
decision itself are important. Because there are so many PEs and optional
start times to check, the scheduling procedure itself affects the ability
and efficiency of the scheduler to manage large-scale distributed resources
and to schedule a great number of jobs with various requirements, and the
scheduling decision also affects the performance perceived by users and
service providers. In the following sections, an efficient data structure
with its operations and a suite of scheduling strategies are proposed to
manage and allocate resources for AR requests with deadlines.



4. Data structure and operations

Similar to the variable slot data structure in [28], for each cluster we
represent the resources allocated to each running or reserved job as a
rectangle and record the availability of the cluster as a set of
{time, PEs} pairs, where time is a point at which the state (i.e., busy or
idle) of the PEs changes, and PEs is the set of identities of the PEs that
are busy from that time on. If PEs is null ($\emptyset$), all busy PEs
recorded in the previous time slot are set free at that time.

In order to record and manage the identities of the PEs occupied in every
time period, a linked-list-based data structure, AvailRectList, is
proposed. When a new job is allocated with its start time ($t_s$), its end
time ($t_e$) and its PEs ($PE_{job}$), the records within the interval
$[t_s, t_e)$ are updated by adding $PE_{job}$ to their PEs, indicating that
$PE_{job}$ will be used by the job from $t_s$ to $t_e$. Accordingly, when
the job is completed, $PE_{job}$ is released and subtracted from the
records within $[t_s, t_e)$. In this way, the PEs already occupied by
running or reserved jobs are known at any time, and we can check the
availability of PEs in any given time interval.

Additionally, to simplify the process of querying the time slots at which 
the states of PEs change and the availability of PEs, an auxiliary sorted set- 
based structure, TimeSet, was used. As the records in AvailRectList change, 
TimeSet will be updated synchronously. 

In order to support advance reservations with deadline, the data structure 
needs to perform three basic operations: adding an allocation, deleting an 
allocation and searching for available allocations for AR requests. 
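
To make the {time, PEs} records concrete, the following Python sketch
queries the free PEs in an interval. It is an illustration only: a plain
time-sorted list stands in for AvailRectList, the function names are
hypothetical, and the PE identities and the total of 8 PEs are assumed
values chosen to mirror Figure 1.

```python
from bisect import bisect_right


def busy_pes(records, t_s, t_e):
    """Union of the PEs that are busy at some point in [t_s, t_e).

    records is a time-sorted list of (time, set_of_busy_PEs) pairs, where
    each set holds the PEs busy from that time until the next record.
    """
    times = [t for t, _ in records]
    # Record in force at t_s: the last one whose time is <= t_s.
    i = bisect_right(times, t_s) - 1
    busy = set(records[i][1]) if i >= 0 else set()
    # Add every state change that happens strictly inside the interval.
    j = i + 1
    while j < len(records) and records[j][0] < t_e:
        busy |= records[j][1]
        j += 1
    return busy


def free_pes(records, all_pes, t_s, t_e):
    """PEs that stay idle during the whole interval [t_s, t_e)."""
    return set(all_pes) - busy_pes(records, t_s, t_e)


# Assumed instance of Figure 1: job1 on PEs {0,1,2} until slot 3, job2 on
# PEs {3,4} until slot 1, job3 reserved on PEs {0,1} during slots [8, 10).
records = [(0, {0, 1, 2, 3, 4}), (1, {0, 1, 2}), (3, set()),
           (8, {0, 1}), (10, set())]
print(sorted(free_pes(records, range(8), 2, 4)))
```

For the interval [2, 4) the sketch reports the PEs not used by job1, which
matches the N - n1 free PEs discussed for $t_2$ in the worked example.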

4.1. Adding and deleting an allocation

Before adding an allocation, we assume a search operation (see Subsection
4.2) has already been done and the start time ($t_s$), the end time ($t_e$)
and the PEs of the job ($PE_{job}$) have already been determined. The
adding operation is described in Algorithm 1, addAllocation($t_s$, $t_e$,
$PE_{job}$). The core of this operation is to update the records in the
interval $[t_s, t_e)$ by adding $PE_{job}$ to their PEs. If AvailRectList
is empty or the earliest time of the records is greater than $t_e$, we just
need to add {$t_s$, $PE_{job}$} and {$t_e$, $\emptyset$} into
AvailRectList. Otherwise, we find all records in the interval $[t_s, t_e)$
and update their PEs by adding $PE_{job}$ to them (lines 4-5). After
updating, it is possible that the PEs of the record of $t_s$ or $t_e$
become the same as those of the record of the time slot just before it, or
that $t_s$ is the earliest time slot and the PEs of its record are null. In
such cases, the records of $t_s$ and/or $t_e$ are redundant and should be
cleaned (line 7). When a job finishes, the deleting operation
deleteAllocation($t_s$, $t_e$, $PE_{job}$) (Algorithm 2) is called
immediately. It applies the same principle as adding an allocation, but
updates the records in the interval $[t_s, t_e)$ by subtracting $PE_{job}$
from their PEs.

The complexity of the addAllocation() or the deleteAllocation() operation
is analyzed as follows. Suppose there are $n$ records in AvailRectList. For
either operation, we need to update the records within $[t_s, t_e)$. Assume
there are $n'$ records within $[t_s, t_e)$ and $k$ PEs will be updated in
each record. It takes $O(n' \log n)$ time to find the $n'$ time slots by
searching TimeSet, $O(n)$ time to find the record for each time slot in the
linked list and $O(k)$ time to update $k$ PEs for each record. After
updating the records of $t_s$ and $t_e$, it takes $O(1)$ time to remove
them if they are redundant. Thus the overall complexity of finding and
updating $n'$ records is $O(n' (n + k + \log n))$.

Algorithm 1: addAllocation($t_s$, $t_e$, $PE_{job}$)

1 if AvailRectList is empty OR TimeSet.first > $t_e$ then
2     AvailRectList.addAll({$t_s$, $PE_{job}$}, {$t_e$, $\emptyset$});
3 else
4     find all records within $[t_s, t_e)$ in AvailRectList;
5     update the PEs of the records found by adding $PE_{job}$ to them;
6 end
7 clean possible redundant records;

Algorithm 2: deleteAllocation($t_s$, $t_e$, $PE_{job}$)

1 find all records within $[t_s, t_e)$ in AvailRectList;
2 update the PEs of the records found by subtracting $PE_{job}$ from them;
3 clean possible redundant records;
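
The adding and deleting operations can be sketched in Python as below. This
is a simplified illustration, not the implementation evaluated in the
paper: a dict plus a sorted list of change points replace the linked list
and TimeSet, the cleaning step scans all records instead of checking only
$t_s$ and $t_e$, and the method names and PE identities are assumptions.

```python
from bisect import bisect_left, bisect_right, insort


class AvailRectList:
    """Sketch of the {time, PEs} records with add and delete operations."""

    def __init__(self):
        self.times = []  # sorted change points (plays the role of TimeSet)
        self.busy = {}   # time -> set of PEs busy from that time on

    def _state_at(self, t):
        # Copy of the busy set in force at time t.
        i = bisect_right(self.times, t) - 1
        return set(self.busy[self.times[i]]) if i >= 0 else set()

    def _ensure_record(self, t):
        # Create a record at t that inherits the state in force there.
        if t not in self.busy:
            self.busy[t] = self._state_at(t)
            insort(self.times, t)

    def _update(self, t_s, t_e, pes, add):
        self._ensure_record(t_s)
        self._ensure_record(t_e)
        lo = bisect_left(self.times, t_s)
        hi = bisect_left(self.times, t_e)
        for t in self.times[lo:hi]:  # records within [t_s, t_e)
            if add:
                self.busy[t] |= pes
            else:
                self.busy[t] -= pes
        self._clean()

    def _clean(self):
        # Drop records whose PEs equal those of the preceding record
        # (or that are empty at the very beginning of the list).
        kept, prev = [], set()
        for t in self.times:
            if self.busy[t] == prev:
                del self.busy[t]
            else:
                kept.append(t)
                prev = self.busy[t]
        self.times = kept

    def add_allocation(self, t_s, t_e, pes):
        self._update(t_s, t_e, set(pes), add=True)

    def delete_allocation(self, t_s, t_e, pes):
        self._update(t_s, t_e, set(pes), add=False)

    def records(self):
        return [(t, sorted(self.busy[t])) for t in self.times]


# Assumed instance of Figure 1 (slot t_i mapped to the integer i).
a = AvailRectList()
a.add_allocation(0, 3, {0, 1, 2})   # job1
a.add_allocation(0, 1, {3, 4})      # job2
a.add_allocation(8, 10, {0, 1})     # job3 (reserved)
initial = a.records()
a.add_allocation(3, 5, {0, 1, 2})   # new reservation on the PEs of job1
print(a.records())
```

After the last call the record at slot 3 carries the same PEs as the record
at slot 1, so the cleaning step removes it and only a new record at slot 5
remains visible.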

4.2. Search feasible allocation

When a new AR request arrives, this operation is performed to check
whether there are enough PEs to be allocated to the job. If so, the
operation chooses and returns the identities of the allocated PEs and the
start time for the job; otherwise, it returns null, indicating that there
are not enough PEs for the request. This operation is defined as
findAllocation($t_r$, $t_{du}$, $t_{dl}$, $n_{pe}$, policy) and is shown in
Algorithm 3, where policy is the scheduling policy used to choose available
PEs and runtime intervals for the request (see Section 5). If AvailRectList
is empty, the operation allocates the request to start at $t_r$ and
allocates n PEs to it (lines 2-3). Otherwise, the operation searches for
feasible start times (line 5), gets the maximum availability rectangle of
every start time and adds it to availRect (lines 6-9). If availRect is not
empty in the end, indicating that there are feasible allocations for the
request, the operation chooses an appropriate start time and allocates PEs
to the request according to the scheduling policy (line 11).

Notably, any time slot in the interval $[t_r, t_{dl} - t_{du}]$ may be an
optional start time for the request. This makes it hard to check the
availability rectangles of resources related to all these start times. To
simplify this operation and to minimize the possible fragmentation of
resources resulting from AR allocations, in line 5 we suggest checking only
the existing time slots in the interval $[t_r, t_{dl}]$ and the new ones
generated by subtracting $t_{du}$ from these existing time slots. For every
optional start time $t_s$, the operation gets the free PEs (i.e.,
$PE_{free}$) in the interval $[t_s, t_s + t_{du})$. This can be done by
iterating through AvailRectList. If the number of free PEs is not less than
$n_{pe}$, $t_s$ is a feasible start time, and the operation finds the
maximum availability rectangle containing $PE_{free}$ and the interval
$[t_s, t_s + t_{du})$ (line 7). Finally, after constructing availability
rectangles for all feasible start times, the operation chooses one of them
according to policy, and returns the appropriate start time and $n_{job}$
PEs for the request.

Algorithm 3: findAllocation($t_r$, $t_{du}$, $t_{dl}$, $n_{job}$, policy)

1 if AvailRectList is empty then
2     let the job run at $t_r$ and allocate n PEs for it;
3     return {$t_r$, IDs of the n PEs};
4 else
5     find all feasible start times {ST} within $[t_r, t_{dl} - t_{du}]$;
6     foreach element $t_s$ in ST do
7         find the maximum availability resource rectangle
          {$T_{begin}$, $T_{end}$, $PE_{free}$} of $t_s$;
8         availRect.add({$t_s$, $T_{begin}$, $T_{end}$, $PE_{free}$});
9     end
10    if availRect is not empty then
11        choose the appropriate $t_s$ and $n_{job}$ PEs according to
          policy from the availability resource rectangles, and return
          {$t_s$, IDs of the $n_{job}$ PEs};
12    else return null;
13 end
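
The candidate generation of line 5 and the feasibility check can be
sketched in Python as follows. This is a hedged illustration: the function
names are hypothetical, records are kept in a plain sorted list, the ready
time and the latest start time are always added as candidates (an
assumption made to match the worked example of Figure 1), and the
construction of maximum availability rectangles is omitted.

```python
def candidate_start_times(change_points, t_r, t_du, t_dl):
    """Existing slots in [t_r, t_dl] plus the same slots shifted by -t_du."""
    latest = t_dl - t_du
    if latest < t_r:
        return []
    cands = {t_r, latest}  # assumption: both endpoints are candidates
    for t in change_points:
        if t_r <= t <= t_dl:
            cands.update((t, t - t_du))
    # Keep only start times that respect the ready time and the deadline.
    return sorted(c for c in cands if t_r <= c <= latest)


def busy_during(records, t_s, t_e):
    """Union of busy PEs over [t_s, t_e); records are sorted (time, set)."""
    busy, current = set(), set()
    for t, pes in records:
        if t <= t_s:
            current = pes   # state in force at t_s
        elif t < t_e:
            busy |= pes     # state change inside the interval
        else:
            break
    return busy | current


def feasible_start_times(records, all_pes, t_r, t_du, t_dl, n):
    """Candidate start times with at least n PEs free for the duration."""
    points = [t for t, _ in records]
    result = []
    for t_s in candidate_start_times(points, t_r, t_du, t_dl):
        free = set(all_pes) - busy_during(records, t_s, t_s + t_du)
        if len(free) >= n:
            result.append((t_s, free))
    return result


# Assumed instance of Figure 1 with 8 PEs and slot t_i mapped to i.
records = [(0, {0, 1, 2, 3, 4}), (1, {0, 1, 2}), (3, set()),
           (8, {0, 1}), (10, set())]
feasible = feasible_start_times(records, range(8), t_r=2, t_du=2, t_dl=9, n=4)
print([t for t, _ in feasible])
```

Restricting the search to these candidates keeps the number of start times
proportional to the number of records rather than to the length of the
interval.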



The complexity of findAllocation() is as follows. If AvailRectList is
empty, the request is allocated to run at $t_r$ immediately, which takes
$O(1)$ time. Otherwise, we can sort the linked list into a sorted array
list (this takes $O(n \log n)$ time), and assume there are $p$ feasible
start times within $[t_r, t_{dl} - t_{du}]$. For each feasible start time,
assume there are $u$ free PEs in its maximum availability rectangle and $v$
neighboring records must be checked to determine the maximum availability
rectangle; this takes $O(u \cdot v)$ time. After getting the maximum
availability rectangle of each feasible start time, the information of the
rectangles is used to build a priority queue ($O(p)$), in which the
rectangle selected according to policy is always at the root ($O(1)$).
Finally, a group of $n_{job}$ free PEs is chosen from the selected
rectangle with $u$ free PEs and allocated to the request
($O(n_{job} \log u)$). Overall, the complexity of searching and allocating
resources for the request is
$O(p \cdot u \cdot v + n \log n + n_{job} \log u + p)$.

The following example illustrates a typical application of these
operations to Figure 1. At $t_0$, there are two running jobs and one
reserved job. Assume the running jobs begin to run at $t_0$ and the records
in AvailRectList are {$t_0$, n1 + n2 PEs}, {$t_1$, n1 PEs},
{$t_3$, $\emptyset$}, {$t_8$, n3 PEs} and {$t_{10}$, $\emptyset$}. The
following steps illustrate the actions of the above operations.

(1) When a new AR request {$t_2$, $t_4 - t_2$, $t_9$, n} arrives, the
scheduler calls findAllocation($t_2$, $t_4 - t_2$, $t_9$, n, policy) to
find available start times and free PEs for the job. In theory, the AR job
may start at any time slot from the ready time $t_2$ to the latest start
time $t_7$. However, we only choose $t_2$, $t_3$, $t_6$ and $t_7$ as
candidate start times and neglect all other slots. In this way, we simplify
the search operation and reduce the fragmentation of resources caused by AR
requests.

(2) Fortunately, there are enough free PEs for the AR request to start at
any of the four start times, and findAllocation() calculates the maximum
availability rectangle for every start time and chooses one of them for the
AR request. For $t_2$, the number of free PEs within the interval
$[t_2, t_4)$ is N - n1, and the beginning slot and ending slot of the
maximum availability rectangle with N - n1 free PEs are $t_1$ and $t_8$.
For $t_3$, the number of free PEs within the interval $[t_3, t_5)$ is N,
and the beginning slot and ending slot of the maximum availability
rectangle with N free PEs are $t_3$ and $t_8$. In the same way, we can get
the numbers of free PEs and the beginning and ending slots of the maximum
availability rectangles of $t_6$ and $t_7$. Assume policy is PE Worst Fit
(see Section 5), $t_3$ is chosen as the start time and n PEs are allocated
to the request; the operation then returns {$t_3$, n PEs}.

(3) After getting the start time and PEs for the AR request, the scheduler
calls addAllocation($t_3$, $t_5$, n PEs) to add the reservation into
AvailRectList. First, the adding operation updates {$t_3$, $\emptyset$} to
{$t_3$, n PEs} and inserts {$t_5$, $\emptyset$} into AvailRectList. Because
{$t_1$, n1 PEs} is the record immediately preceding {$t_3$, n PEs} in
AvailRectList and the n1 PEs of $t_1$ are the same as the n PEs of $t_3$,
{$t_3$, n PEs} is merged with {$t_1$, n1 PEs} and removed from
AvailRectList.

(4) At $t_1$, job2 finishes, and deleteAllocation($t_0$, $t_1$, n2 PEs) is
called to subtract the n2 PEs from the records within the interval
$[t_0, t_1)$. The original record of $t_0$ changes from
{$t_0$, n1 + n2 PEs} to {$t_0$, n1 PEs}, and the original record of $t_1$,
i.e., {$t_1$, n1 PEs}, is merged with the new {$t_0$, n1 PEs} and then
removed. Finally, the remaining records in AvailRectList are
{$t_1$, n1 PEs}, {$t_5$, $\emptyset$}, {$t_8$, n3 PEs} and
{$t_{10}$, $\emptyset$}.

5. Scheduling algorithms 

If more than one allocation can satisfy the request, the scheduler chooses
one of them based on some criteria. Considering the feasible start times
themselves and their maximum availability rectangles, we have developed the
following scheduling strategies to control the allocation of resources for
AR requests.

First Fit (FF): the job is allocated to run at the earliest feasible start
time.

PE Best Fit (PE_B): the job is allocated to run at the feasible start time
with the minimum number of free PEs.

Duration Best Fit (Du_B): the job is allocated to run at the feasible start
time whose availability rectangle has the minimum duration.

PE-Duration Best Fit (PEDu_B): the job is allocated to run at the feasible
start time whose availability rectangle has the minimum product of the
number of free PEs and the duration.

PE_B, Du_B and PEDu_B choose the feasible start time whose availability
rectangle has the minimum number of free PEs, duration or product,
respectively. We can also construct their corresponding maximum versions,
i.e., the PE Worst Fit (PE_W) algorithm, the Duration Worst Fit (Du_W)
algorithm and the PE-Duration Worst Fit (PEDu_W) algorithm.

In practice, it is possible that more than one feasible start time has the
same availability rectangle. For example, in Figure 1, $t_3$ and $t_6$ have
the same availability rectangle, which has N free PEs between $t_3$ and
$t_8$. In such cases, if this rectangle is chosen for the request, the
earliest feasible start time is chosen, so as to shorten the waiting time
of the job. For example, in Figure 1, $t_3$ instead of $t_6$ is chosen by
the scheduler when Du_B or PE_W is used.
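
The seven policies and the tie-breaking rule can be sketched in Python as
scoring functions over availability rectangles. This is an illustration
under assumptions: the Rect type, the SCORES table and the choose function
are hypothetical names, and the rectangle values assume N = 8 and n1 = 3
with slot $t_i$ mapped to the integer i.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Availability rectangle of one feasible start time (illustrative)."""
    t_s: float      # feasible start time
    t_begin: float  # beginning slot of the rectangle
    t_end: float    # ending slot of the rectangle
    n_free: int     # number of free PEs in the rectangle


# Lower score wins; worst-fit variants negate the best-fit score.
SCORES = {
    "FF":     lambda r: r.t_s,
    "PE_B":   lambda r: r.n_free,
    "PE_W":   lambda r: -r.n_free,
    "Du_B":   lambda r: r.t_end - r.t_begin,
    "Du_W":   lambda r: -(r.t_end - r.t_begin),
    "PEDu_B": lambda r: r.n_free * (r.t_end - r.t_begin),
    "PEDu_W": lambda r: -r.n_free * (r.t_end - r.t_begin),
}


def choose(rects, policy):
    """Pick a rectangle by policy, breaking ties by earliest start time."""
    score = SCORES[policy]
    return min(rects, key=lambda r: (score(r), r.t_s), default=None)


# Rectangles of t_2, t_3 and t_6 from the worked example (assumed N = 8).
rects = [Rect(2, 1, 8, 5), Rect(3, 3, 8, 8), Rect(6, 3, 8, 8)]
for name in ("FF", "PE_W", "Du_B"):
    print(name, choose(rects, name).t_s)
```

Because every policy reduces to a score on the same rectangle record, a new
policy only needs a new entry in the table, which reflects the flexibility
claimed for the data structure.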

6. Performance evaluation 

In order to verify the data structure and its operations, and to evaluate 
the performance of the scheduling strategies, we implemented the data struc- 
ture and its operations in a discrete event-driven simulator, applied these 
strategies to schedule AR requests, and analyzed their performance metrics. 



The simulator is implemented on the basis of SimJava [32], a process-based
discrete-event simulation package for Java originally developed by the
University of Edinburgh. SimJava is widely used to build simulators in
research. A SimJava simulation is a collection of entities, each running in
its own thread. The entities are connected by ports and communicate with
each other by sending and receiving events. A central system class controls
all the entities, advances the simulation time, and delivers the events. In
our simulator, we implemented a hierarchical architecture to model cluster
or grid-like computing environments and to evaluate the operation and
performance of different resource management strategies. The simulator
includes entities such as meta-users, meta-schedulers and multiprocessor
systems. A meta-user is responsible for generating AR requests and
submitting them to the job queue of the meta-scheduler. The meta-scheduler
links to multiprocessor entities, manages their availability information
via the data structure proposed in this paper, and allocates resources
according to the scheduling policies. In a multiprocessor entity, a local
scheduler entity and multiple processing element entities are created, and
they are responsible for processing the AR requests submitted by the
meta-scheduler.

6.1. Simulation environments 

For experiments based on discrete-event simulation, a workload is needed
to drive the simulation. However, no workload traces of advance
reservations are available that can be used in this paper directly.
Therefore, the LANL-CM5 log in the Parallel Workload Archive [30] and the
Feitelson-Lublin model [31] were used to generate AR jobs with deadline
constraints. The LANL-CM5 is a 1024-node Connection Machine CM-5 system in
which processors are allocated only in powers of 2, with the minimal and
maximal partition sizes being 32 and 1024 processors respectively. In the
experiments, the distributions and parameters used in the Feitelson-Lublin
model to generate the workload were set according to the LANL-CM5 values in
[31]. The following models and parameters were used to control the
generation of jobs:

(1) The combined model of the arrival process in the Feitelson-Lublin model
and its parameters for LANL-CM5 were used to control the arrival of jobs.

(2) The two-stage uniform distribution with parameters ULow, UMed, UHi and
Uprob was used to control the sizes of the jobs generated. In this
distribution, all jobs are parallel, i.e., the probability of serial jobs
is 0, and their sizes are powers of 2, with a minimal size of 32 (i.e.,
ULow = 4.5), a maximal size of 1024 (i.e., UHi = 10) and Uprob = 0.82. For
the original LANL-CM5 log, UMed is 7. In order to control the sizes of the
jobs generated, UMed was set to 5, 6, 7, 8 and 9 in separate experiments.
As UMed changes from 5 to 9, the mean size of jobs increases.
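
The effect of UMed on job sizes can be sketched in Python as follows. This
is a rough illustration and not the generator used in the experiments: it
assumes that the two-stage uniform distribution draws from [ULow, UMed]
with probability Uprob and from [UMed, UHi] otherwise, and that the drawn
value is rounded to the nearest integer exponent of 2. The function name
and the rounding rule are assumptions.

```python
import math
import random


def sample_job_size(rng, u_low=4.5, u_med=7.0, u_hi=10.0, u_prob=0.82):
    """Draw a power-of-2 job size from an assumed two-stage uniform model."""
    if rng.random() < u_prob:
        x = rng.uniform(u_low, u_med)   # first stage: smaller jobs
    else:
        x = rng.uniform(u_med, u_hi)    # second stage: larger jobs
    # Round half up to an integer exponent (assumption); with
    # u_low = 4.5 and u_hi = 10 this gives sizes from 32 to 1024.
    return 2 ** math.floor(x + 0.5)


def mean_size(u_med, samples=2000, seed=0):
    rng = random.Random(seed)
    return sum(sample_job_size(rng, u_med=u_med) for _ in range(samples)) / samples


for u_med in (5, 7, 9):
    print(u_med, mean_size(u_med))
```

Under these assumptions a larger UMed shifts probability mass toward larger
exponents, which is consistent with the statement that the mean job size
grows as UMed goes from 5 to 9.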

(3) Runtime is an important characteristic of a rigid job. In the
Feitelson-Lublin model, the hyper-Gamma distribution is used to model
runtimes, and a group of parameters was verified to be appealing and
representative for each and all workloads. Although the resulting runtimes
in this model are discrete, their distribution is very different from the
distribution of sizes and spans a very large range of values. In the
interest of efficient computability and representability, we made minor
modifications to this model so that it only generates runtime values of
60, 300, 900, 1800, 3600 and 10800. The distribution of these new runtime
values was determined by comparing the distribution of estimated runtimes
in the original LANL-CM5 records with the distribution of runtimes
generated by the model. Moreover, as the size and the runtime of a job are
correlated, when UMed changes, the distributions of both sizes and
runtimes change. In this way, we can evaluate the performance of different
scheduling algorithms as the distributions of sizes and runtimes of jobs
change.
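
As an illustration only, the size-generation step in (2) can be sketched
in Python as follows. The function name sample_job_size, the reading of
Uprob as the probability of the lower interval, the rounding rule and the
clamping of the exponent to 5..10 are assumptions made for this sketch;
they are not taken from the Feitelson-Lublin model code.

```python
import math
import random


def sample_job_size(rng, u_low=4.5, u_med=7.0, u_hi=10.0, u_prob=0.82,
                    min_exp=5, max_exp=10):
    """Draw a power-of-two job size from a two-stage uniform distribution.

    The value drawn is the base-2 logarithm of the size.
    """
    # Assumption: with probability u_prob the exponent comes from
    # [u_low, u_med], otherwise from [u_med, u_hi].
    if rng.random() < u_prob:
        u = rng.uniform(u_low, u_med)
    else:
        u = rng.uniform(u_med, u_hi)
    # Assumption: round to the nearest integer exponent and clamp it so
    # that sizes stay within 32 (2^5) and 1024 (2^10) processors.
    exponent = min(max(math.floor(u + 0.5), min_exp), max_exp)
    return 2 ** exponent


if __name__ == "__main__":
    rng = random.Random(42)
    for u_med in (5.0, 6.0, 7.0, 8.0, 9.0):
        sizes = [sample_job_size(rng, u_med=u_med) for _ in range(10000)]
        print(u_med, sum(sizes) / len(sizes))
```

Under these assumptions, a larger UMed moves probability mass towards
larger exponents, which is consistent with the statement that the mean job
size increases as UMed changes from 5 to 9.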

By using the LANL-CM5 workload and the Feitelson-Lublin model, we can
generate a series of jobs, each with an arrival time, a size and a
duration. In order to add deadline and advance reservation constraints to
the resulting jobs, two factors were used:

• artime factor (> 0): controls the period between the arrival time $t_a$
and the ready time $t_r$ of an AR request. The period is defined as
artime factor $\times\, U[0, 1] \times t_{du}$, where $U[0, 1]$ is a
random number uniformly distributed in [0, 1]. This parameter is set
based on [33].

• deadline factor ($\geq 0$): controls the job's deadline, which is
defined as $t_{dl} = t_r + (1 + \text{deadline factor} \times U[0, 1])
\times t_{du}$. If this factor is zero, the deadline is immediate, i.e.,
$t_{dl} = t_r + t_{du}$; otherwise, the deadline is general, i.e.,
$t_{dl} > t_r + t_{du}$.
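
The way the two factors turn a plain job into an AR job with a deadline
can be sketched as follows. This is a minimal sketch, assuming that the
advance period scales with the job duration t_du; the function name
add_ar_constraints and its argument names are hypothetical and are not
taken from the simulator.

```python
import random


def add_ar_constraints(t_a, t_du, artime_factor, deadline_factor, rng):
    """Return (t_r, t_dl) for a job that arrives at t_a and runs for t_du.

    t_r is the ready time and t_dl is the deadline of the AR request.
    """
    # Ready time: arrival time plus a random advance period
    # (assumed to be proportional to the duration t_du).
    t_r = t_a + artime_factor * rng.random() * t_du
    # Deadline: ready time plus the duration stretched by a random slack.
    # With deadline_factor == 0 this gives the immediate deadline
    # t_dl = t_r + t_du.
    t_dl = t_r + (1.0 + deadline_factor * rng.random()) * t_du
    return t_r, t_dl


if __name__ == "__main__":
    rng = random.Random(7)
    print(add_ar_constraints(t_a=1000.0, t_du=300.0,
                             artime_factor=3, deadline_factor=3, rng=rng))
```

Larger factors widen both the advance period and the window between the
ready time and the deadline, which is what gives the scheduler more room
to place a job.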

With these parameters, we can generate jobs with deadlines and advance
reservation requests from a workload trace. For AR jobs, as the values of
artime factor and deadline factor increase, the flexibility of scheduling
increases and the resource competition between AR jobs is alleviated.
Based on the influence of these parameters on AR jobs, the two factors
were combined as {artime factor, deadline factor}, and five pairs of
values, i.e., {1, 1}, {2, 2}, {3, 3}, {4, 4} and {5, 5}, were used to
generate low-, middle- and high-flexibility AR jobs.

In order to generate workloads with different distributions of
inter-arrival times, and further to investigate the performance of the
strategies under different system loads, arrival factor (af in short) is
defined and used as follows: for a job in a given workload with arrival
time $t_{s0}$, its new arrival time will be $t_{s0}$ / arrival factor. In
this way, we can control the arrival of jobs and thus control the system
load. In the experiments, af = 1 is set as the default.
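
The effect of arrival factor on a trace can be sketched as below. The
helper name scale_arrivals is hypothetical; the sketch only restates the
division described above.

```python
def scale_arrivals(arrival_times, arrival_factor=1.0):
    """Divide every arrival time by arrival_factor.

    A factor above 1 compresses the trace (more requests per unit of
    time, higher load); a factor below 1 stretches it.
    """
    if arrival_factor <= 0:
        raise ValueError("arrival_factor must be positive")
    return [t / arrival_factor for t in arrival_times]


if __name__ == "__main__":
    trace = [0.0, 100.0, 300.0]
    print(scale_arrivals(trace, 2.0))
    print(scale_arrivals(trace, 0.5))
```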

In the experiments, the following two metrics were used to evaluate the
performance of different scheduling strategies:

(1) Acceptance rate: the percentage of reservations that are accepted
because their requirements can be satisfied, which indicates the ability
to accommodate AR requests.

(2) Average slowdown: the slowdown of an AR job is the response time of
the job normalized by its running time, i.e., (waiting time + runtime) /
runtime, where waiting time is the difference between the ready time and
the actual start time. This measures how much slower the job ran due to
conflicts with other competing jobs, and it seems more reasonable than the
waiting time alone because it captures the user's expectation that a job's
waiting time will be proportional to its runtime. The average slowdown is
the mean of the slowdowns of all accepted AR jobs, which indicates how
well the scheduling algorithm can satisfy the user's expectations on the
execution of the job.
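
The two metrics can be computed as in the following sketch. The function
names and the (ready time, start time, runtime) tuple layout are
assumptions chosen for illustration.

```python
def acceptance_rate(num_accepted, num_submitted):
    """Percentage of submitted AR requests that were accepted."""
    return 100.0 * num_accepted / num_submitted


def slowdown(t_ready, t_start, runtime):
    """(waiting time + runtime) / runtime for one accepted AR job."""
    waiting_time = t_start - t_ready
    return (waiting_time + runtime) / runtime


def average_slowdown(accepted_jobs):
    """Mean slowdown over (t_ready, t_start, runtime) tuples."""
    values = [slowdown(*job) for job in accepted_jobs]
    return sum(values) / len(values)


if __name__ == "__main__":
    jobs = [(0.0, 0.0, 60.0), (0.0, 300.0, 300.0)]
    print(acceptance_rate(len(jobs), 4))
    print(average_slowdown(jobs))
```

A job that starts exactly at its ready time has a slowdown of 1, which is
the lower bound of the metric.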

6.2. Experimental results 

In the experiments, we investigated the performance of the scheduling
strategies against different job sizes and durations, different arrival
factors and different {artime factor, deadline factor} values. For each
experiment, 10^ jobs were submitted to the scheduler, and 95% confidence
intervals were obtained for the results.
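
The paper does not state how the confidence intervals were computed. As
one possible approach, the sketch below derives a 95% interval for the
mean of a metric from repeated measurements using a normal approximation;
the function name mean_confidence_interval and the sample values are
hypothetical.

```python
import math
import statistics


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of samples."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Two-sided quantile of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


if __name__ == "__main__":
    # Hypothetical acceptance rates from five independent runs.
    print(mean_confidence_interval([0.70, 0.72, 0.71, 0.69, 0.73]))
```

With only a handful of runs, a Student t quantile would give a wider and
more appropriate interval than the normal quantile used here.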

6.2.1. Results for different job sizes and durations 

Figure 2 and Figure 3 present the acceptance rate and the average
slowdown, respectively, of different scheduling strategies for workloads
with different UMed values. In these experiments, arrival factor is 1 and
{artime factor, deadline factor} is {3, 3} by default. As shown in
Figure 2, when UMed changes from low to middle and high, the acceptance
rates of all strategies decrease gradually. This result agrees with
intuition: as UMed increases, the mean size and the mean duration of jobs
increase, so more resources are demanded to accommodate them and the
competition for available resources between jobs intensifies.

In Figure 2, there are three groups of algorithms with almost identical
behavior: PE_W and Du_B, PEDu_B and PEDu_W, and PE_B and Du_W. Except for
PE_B and Du_W, the other four strategies outperform FF. Moreover, PE_W and
Du_B in the first group perform much better than FF and are clearly the
best strategies for all UMed values. Notably, except for the almost
identical behavior of PEDu_B and PEDu_W, the performance of PE_B and PE_W,
and of Du_B and Du_W, is quite different. Based on these results, we
cannot conclude that PE-based strategies are better or worse than
duration-based ones, or that best fit-based strategies are always better
or worse than the equivalent worst fit-based ones. This can be explained
by the fact that, for an idle period of a resource, the number of its PEs
and its duration influence the accommodation of new jobs in different
ways.
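
To make the comparison between the strategies concrete, the sketch below
shows one way a policy could pick among several feasible idle periods. The
policies themselves are defined earlier in the paper; the Candidate type,
the ranking keys (in particular the PEs-times-duration key for PEDu_B and
PEDu_W) and the example values are assumptions made for this sketch and
may differ from the actual definitions.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    start: float     # earliest feasible start time in the idle period
    free_pes: int    # number of PEs available in the idle period
    duration: float  # length of the idle period


# Each policy ranks candidates by a key; the smallest key wins.
POLICY_KEYS = {
    "FF": lambda c: c.start,                         # earliest period
    "PE_B": lambda c: c.free_pes,                    # fewest PEs
    "PE_W": lambda c: -c.free_pes,                   # most PEs
    "Du_B": lambda c: c.duration,                    # shortest period
    "Du_W": lambda c: -c.duration,                   # longest period
    "PEDu_B": lambda c: c.free_pes * c.duration,     # smallest area
    "PEDu_W": lambda c: -(c.free_pes * c.duration),  # largest area
}


def choose(candidates: Sequence[Candidate],
           policy: str) -> Optional[Candidate]:
    """Pick one feasible candidate according to the named policy."""
    if not candidates:
        return None  # no feasible period: the request is rejected
    return min(candidates, key=POLICY_KEYS[policy])


if __name__ == "__main__":
    feasible = [
        Candidate(start=0.0, free_pes=64, duration=1000.0),
        Candidate(start=100.0, free_pes=512, duration=400.0),
        Candidate(start=50.0, free_pes=128, duration=5000.0),
    ]
    for name in POLICY_KEYS:
        print(name, choose(feasible, name))
```

In this toy example, PE_W and Du_B happen to select the same idle period
because the candidate with the most PEs is also the shortest one. This
illustrates how PE-based and duration-based keys can coincide on some
inputs; it is not evidence for the simulation results.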

Now turn to Figure 3, which plots the average slowdown of different
strategies for workloads with different UMed values. When UMed changes
from low to middle and high, the average slowdown of all strategies
increases in general. This can be explained as follows: as UMed increases,
the mean size and the mean duration of jobs increase, so a new job will
experience a longer waiting time before execution no matter which
algorithm is used. As we can see, the jobs with FF experience the lowest
average slowdown. This is because FF always chooses the earliest feasible
period and thus minimizes the waiting time of jobs. For the other
strategies, the performance of PE_W and Du_B, and of PE_B and Du_W, is
again similar, as in Figure 2. However, the performance of PEDu_B and
PEDu_W is surprisingly different.

Figure 2: Acceptance rate vs job size control parameter UMed



6.2.2. Results for different system load 

In both [31] and the aforementioned experiments, UMed is typically set
to 7. In the following experiments, we investigate the performance of the
strategies against different arrival factor values with UMed = 7 and
{artime factor, deadline factor} = {3, 3}.

Figures 4 and 5 illustrate the acceptance rate and the average slowdown
of the strategies as arrival factor changes from 0.5 to 1.5 in steps of
0.25. As expected, as the value of arrival factor increases, the
acceptance rates and slowdowns of all strategies degrade in both figures.
This agrees with the fact that, as the value of arrival factor increases,
the number of AR requests submitted within a given period increases, so
the competition for resources among jobs intensifies and the acceptance
rate decreases. Accepted AR requests also tend to experience longer
waiting times, because more jobs are allocated in their expected
execution periods as the value of arrival factor increases.

By comparing the results in Figures 2 and 4, and the results in Figures 3
and 5, it can be seen clearly that the relative performance of the
scheduling algorithms in both experiments is similar. Based on the results
in Figures 2-5, we can conclude that job sizes and durations and system
load clearly affect the performance metrics of all the scheduling
algorithms and the performance perceived by the users: as job sizes and
durations and system load increase, the acceptance rate and the average
slowdown of all algorithms degrade, and AR jobs experience a lower
acceptance rate and a longer waiting time.

Figure 3: Slowdown vs job size control parameter UMed

6.2.3. Results for different scheduling flexibilities 

Figures 6 and 7 present the acceptance rate and the average slowdown of
different scheduling algorithms when the values of {artime factor,
deadline factor} change. As shown in Figure 6, when the values of {artime
factor, deadline factor} change from low to middle and high, the
acceptance rates of PE_W, Du_B and PEDu_B increase almost linearly. This
behavior indicates that their acceptance ability is stable throughout the
range of flexibilities considered in this study. However, the performance
improvements of the other four strategies are not stable, especially at
{4, 4}, indicating that they are sensitive to the degree of scheduling
flexibility. Among all strategies, PE_W is again the best algorithm, with
the highest acceptance rate. It performs better than Du_B as the
flexibility of scheduling increases and clearly outperforms the other
strategies throughout the range of values.

Figure 4: Acceptance rate vs arrival factor with UMed=7

Figure 7 presents the average slowdowns of the strategies with different
scheduling flexibilities. As the scheduling flexibility increases, the
average slowdowns of all strategies increase, which agrees with the
intuition that the more flexibility an AR request has in scheduling, the
longer its waiting time and the larger its slowdown will be. Moreover, the
relative performance of the curves is similar to that observed earlier:
FF always has the smallest slowdown because it allocates AR jobs to run as
early as possible.

Figure 6: Acceptance rate vs {artime factor, deadline factor} with UMed=7



7. Conclusions and discussions 

In this paper, we discussed the scheduling model and algorithms for
parallel AR jobs with deadlines. We proposed a new data structure and a
set of operations to organize the availability of multiprocessor systems
for single- and/or multiple-processor advance reservation requests with
immediate or general deadline constraints in a way that enables efficient
search and update operations, formulated a suite of scheduling policies
for the data structure to allocate resources for AR requests, and
investigated their performance via simulation. A comprehensive performance
evaluation of the scheduling policies by simulation shows that job sizes
and durations, system load and the flexibility of scheduling all affect
the performance metrics of all the scheduling algorithms. Among them, the
PE Worst Fit algorithm is the best algorithm for the scheduler, with the
highest acceptance rate of AR requests, and the jobs with the First Fit
algorithm experience the lowest average slowdown. The simulator and the
simulations verified that the data structure, its operations and the
scheduling policies are efficient and effective in such computing
environments, and can be used in practice. Moreover, because the data
structure can support different scheduling policies in a flexible way,
other scheduling policies can be easily integrated into the system.

Figure 7: Slowdown vs {artime factor, deadline factor} with UMed=7

In the research on the data structure, its operations and the scheduling
policies, we assumed that the resources are homogeneous and the jobs are
rigid. However, they can be extended to support heterogeneous resources
and malleable jobs in the future. If the system is heterogeneous, i.e.,
the capacities of the resources in the system are not the same, we can
standardize the capacities of the resources and the requirements of the
jobs by using a 'standard' resource. In this way, the capacity of each
resource and the requirement of each job are described by referring to the
standard resource, and we can organize the 'standardized' capacities of
the resources in the data structure and allocate 'standardized' resources
for the jobs with 'standardized' requirements. On the other hand, for
malleable jobs, the requirements on the number of PEs and the duration are
not fixed. If a malleable job's requirement on the number of PEs changes,
its duration changes accordingly. However, in the
findAllocation($t_r$, $t_{du}$, $t_{dl}$, $n_{job}$, policy) algorithm in
Subsection 4.2, the number of PEs (i.e., $n_{job}$) and the time-related
constraints (i.e., $t_r$, $t_{du}$, $t_{dl}$) are rigid. To support
malleable jobs, the malleable requirements on the number of PEs and the
time-related parameters of a job should be 'translated' into a group of
rigid ones; those rigid parameters can then be used to find resources for
the job by using the
findAllocation($t_r$, $t_{du}$, $t_{dl}$, $n_{job}$, policy) algorithm.
Additionally, some new criteria should be designed to choose an allocation
for the malleable job among the group of rigid parameters. How to
'translate' the requirements of a malleable AR job with deadline
constraint into a group of rigid parameters is also a problem to be
considered. In the future, we plan to investigate the problems for
heterogeneous resources and malleable jobs in more detail.